SVD-based universal DNN modeling for multiple scenarios
Abstract
Speech recognition scenarios (also called tasks) differ from one another in acoustic transducers, acoustic environments, speaking styles, and so on. Building one acoustic model per task is common practice in industry. However, this limits the sharing of training data across scenarios and thus may not give the highest possible accuracy. Based on the deep neural network (DNN) technique, we propose to build a universal acoustic model for all scenarios by using all the data together. This brings two advantages: 1) leveraging more data sources to improve recognition accuracy, and 2) substantially reducing service deployment and maintenance costs. We achieve this by extending the singular value decomposition (SVD) structure of DNNs. The data from all scenarios are first used to train a single SVD-DNN model. Then a series of scenario-dependent linear square matrices are added on top of each SVD layer and updated with only scenario-related data. At recognition time, a flag indicates the scenario and guides the recognizer to use the scenario-dependent matrices together with the scenario-independent matrices in the universal DNN for acoustic score evaluation. In our experiments on Microsoft Winphone/Skype/Xbox data sets, the universal DNN model outperforms traditionally trained isolated models, with up to 15.5% relative word error rate reduction.
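The structure described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. It assumes that an "SVD layer" is the low-rank bottleneck obtained by factorizing a weight matrix with truncated SVD, that each scenario-dependent square matrix acts on the bottleneck output, and that those matrices start as the identity. The names `svd_factorize` and `UniversalSVDLayer`, and the scenario keys, are hypothetical. Only the linear part of the layer is shown; the nonlinearity and the training loop are omitted.

```python
import numpy as np


def svd_factorize(W, k):
    """Approximate W (m x n) by a rank-k product A @ B using truncated SVD."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    A = U[:, :k] * s[:k]  # m x k, singular values folded into the left factor
    B = Vt[:k, :]         # k x n
    return A, B


class UniversalSVDLayer:
    """Shared low-rank factors plus one small square matrix per scenario.

    Assumption: the square matrix sits between the two shared factors
    and is initialized to the identity, so the layer initially behaves
    exactly like the shared (scenario-independent) model.
    """

    def __init__(self, W, k, scenarios):
        self.A, self.B = svd_factorize(W, k)  # scenario-independent factors
        self.S = {name: np.eye(k) for name in scenarios}  # scenario-dependent

    def forward(self, x, scenario):
        h = self.B @ x            # project the input to the k-dim bottleneck
        h = self.S[scenario] @ h  # matrix selected by the scenario flag
        return self.A @ h         # expand back to the layer's output size


# Illustrative sizes and scenario keys only.
rng = np.random.default_rng(0)
W = rng.standard_normal((6, 8))
layer = UniversalSVDLayer(W, k=3, scenarios=["winphone", "skype", "xbox"])
x = rng.standard_normal(8)
y = layer.forward(x, "skype")
```

Under these assumptions, adapting to a scenario means updating only its k x k matrix on that scenario's data while the shared factors stay fixed. With k much smaller than the layer width, each scenario adds k*k parameters per layer rather than a full m*n weight matrix.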
Similar resources
Restructuring of deep neural network acoustic models with singular value decomposition
The recently proposed deep neural network (DNN) obtains significant accuracy improvements in many large vocabulary continuous speech recognition (LVCSR) tasks. However, a DNN requires many more parameters than traditional systems, which brings a huge cost during online evaluation and also limits the application of DNNs in many scenarios. In this paper we present our new effort on DNN aiming at redu...
Scenario-based modeling for multiple allocation hub location problem under disruption risk: multiple cuts Benders decomposition approach
The hub location problem arises in a variety of domains such as transportation and telecommunication systems. In many real-world situations, hub facilities are subject to disruption. This paper deals with the multiple allocation hub location problem in the presence of facilities failure. To model the problem, a two-stage stochastic formulation is developed. In the proposed model, the number of ...
Speeding up deep neural network based speech recognition systems
Recently, deep neural network (DNN) based acoustic modeling has been successfully applied to large vocabulary continuous speech recognition (LVCSR) tasks. A relative word error reduction around 20% can be achieved compared to a state-of-the-art discriminatively trained Gaussian Mixture Model (GMM). However, due to the huge number of parameters in the DNN, real-time decoding is a bottleneck for ...
An Investigation of Deep Neural Networks for Multilingual Speech Recognition Training and Adaptation
Different training and adaptation techniques for multilingual Automatic Speech Recognition (ASR) are explored in the context of hybrid systems, exploiting Deep Neural Networks (DNN) and Hidden Markov Models (HMM). In multilingual DNN training, the hidden layers (possibly extracting bottleneck features) are usually shared across languages, and the output layer can either model multiple sets of l...
Multi-resolution stacking for speech separation based on boosted DNN
Recent progress in speech separation shows that deep neural networks (DNN) based supervised methods can improve the performance in difficult noise conditions and exhibit good generalization to unseen noise scenarios. However, existing approaches do not explore contextual information sufficiently. In this paper, we focus on exploring contextual information using DNN. The proposed method has two ...
Publication date: 2015